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SMOOTH  NONPARAMETRIC  QUANTILE  ESTIMATION  UNDER 
CENSORING:  SIMULATIONS  AND  BOOTSTRAP  METHODS 


V.  J.  Padgett  and  L.  A."  Thombs 

Department  of  Statistics 
University  of  South  Carolina 
Columbia,  South  Carolina  29208 


ABSTRACT 

Based  on  right-censored  data  from  a  lifetime  distribution  F  , 

a  smooth  nonparametric  estimator  of  the  quantile  function  Q°(p)  is 
-11“ 

given  by  Qn(p)=h  J0Qn( t )K( ( t-p)/h)d t ,  where  Qn(p)  denotes  the 
product-limit  quantile  function.  Extensive  Monte  Carlo  simula¬ 
tions  indicate  that  at  fixed  p  this  kernel-type  quantile  estimator 
has  smaller  mean  squared  error  than  Qn(p)  for  a  range  of  values  of 
the  bandwidth  h.  A  method  of  selecting  an  "optimal"  bandwidth  (in 
the  sense  of  small  estimated  mean  squared  error)  based  on  the 
bootstrap  is  investigated  yielding  results  consistent  with  the 
simulation  study.  The  bootstrap  is  also  used  to  obtain  interval 
estimates  for  Q°(p)  after  determining  the  optimal  value  of  h. 


1.  INTRODUCTION 

Arbitrarily  right-censored  data  arise  naturally  in  industrial 
life  testing  and  medical  studies.  In  these  situations  it  is 
important  to  be  able  to  obtain  good  nonparametric  estimates  of 
various  characteristics  of  the  unknown  lifetime  distribution.  One 
characteristic  of  the  lifetime  distribution  that  is  of  interest  is 
the  quantile  function ■  For  right-censored  data,  Sander  (1975) 


proposed  estimation  of  the  quantile  function  by  the  product-limit 

quantile  estimator,  defined  by  Q  sF~\  where  F  denotes  the 

n  n  n 

product-limit  estimator  of  the  lifetime  distribution  function  Fq 
(Kaplan  and  Meier,  1958;  Efron,  1967).  Sander  (1975)  and  Cheng 
(1984)  obtained  some  asymptotic  properties  of  Qn,  and  Csorgo 
(1983)  discussed  strong  approximation  results. 

The  product-limit  quantile  estimator  is  a  step  function  with 
jumps  corresponding  to  the  uncensored  observations.  A  smoothed 
nonparametric  estimator  of  the  quantile  function  from  right- 
censored  observations  based  on  the  kernel  method  was  proposed 
by  Padgett  (1986),  extending  the  complete  sample  results  of  Yang 
(1985).  Lio,  Padgett,  and  Yu  (1986)  and  Lio  and  Padgett  (1985) 
studied  some  of  the  asymptotic  properties  of  this  kernel 
estimator,  including  asymptotic  normality  and  mean  square 
convergence. 

In  general,  the  effective  performance  of  nonparametric 
function  estimators  is  critically  dependent  on  the  choice  of  a 
"smoothing  parameter."  If  not  enough  smoothing  is  done,  the 
estimate  will  be  "rough,"  showing  features  which  do  not  represent 
the  function  being  estimated.  On  the  other  hand,  if  too  much 
smoothing  is  done,  important  features  of  the  curve  may  not  show  up 
since  they  are  essentially  smoothed  away  (Marron,  1986).  For 
kernel-type  estimators,  the  smoothing  parameter  is  often  called 
the  "bandwidth,"  and  an  important  question  that  arises  is:  Given 
a  set  of  data,  what  value(s)  of  the  bandwidth  are  best  to  use  in 
calculating  the  smooth  estimator  in  the  sense  of  minimum  mean 
squared  error,  or  with  respect  to  some  other  criterion? 

The  objectives  of  this  paper  are  two-fold.  One  is  to  report 
results  of  extensive  Monte  Carlo  simulations  which  demonstrate  the 
behavior  of  the  mean  squared  error  of  the  kernel  estimator  with 
respect  to  bandwidth.  These  simulations  provide  a  method  of 
choosing  an  optimal  bandwidth  when  the  form  of  the  lifetime  and 
censoring  distributions  are  known.  Also,  they  compare  the  kernel- 
type  estimator  with  the  product-limit  quantile  estimator.  Five 
commonly  used  parametric  lifetime  distributions,  two  censoring 
mechanisms,  and  four  different  kernel  functions  are  considered  in 
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this  study,  which  is  an  extension  of  the  brief  simulations  for 
exponential  distributions  reported  by  Padgett  (1986). 

The  second  objective  is  to  present  a  nonparametric  method  for 
bandwidth  selection  based  on  the  bootstrap  for  right-censored 
data.  This  data-based  procedure  uses  the  bootstrap  to  estimate 
mean  squared  error,  and  is  both  an  extension  and  modification  of 
the  methods  proposed  by  Padgett  (1986).  Bandwidth  selection 
using  the  bootstrap  is  important  for  small  and  moderately  large 
samples  since  no  exact  expressions  exist  for  the  mean  squared 
error  of  the  kernel-type  quantile  estimator.  -  Lio  and  Padgett 
(1985)  obtained  an  upper  bound  on  the  mean  square  convergence  rate 
of  the  kernel  quantile  estimator  under  random  right-censorship, 
but  the  bound  is  not  sharp  and  does  not  readily  lead  to  an 
optimal  choice  of  the  bandwidth  in  the  sense  of  minimum  mean 
squared  error.  The  bootstrap  also  provides  a  method  of  obtaining 
confidence  intervals  for  the  unknown  quantiles  or  more  generally, 
confidence  bands  for  the  quantile  function.  Two  such  intervals 
are  presented  in  this  paper. 

In  Section  2,  the  estimators  and  some  of  their  asymptotic 
properties  are  discussed.  The  simulation  results  are  reported  in 
Section  3.  The  bootstrap  bandwidth  selection  procedure  is 
presented  in  Section  4,  and  some  examples  are  given  which 
indicate  that  the  data-based  nonparametric  method  yields  optimal 
bandwidths  which  are  consistent  with  the  Monte  Carlo  results  of 
Section  3.  In  Section  5,  two  bootstrap  confidence  interval 
procedures  are  presented  along  with  some  convergence  results  which 
provide  asymptotic  validity  for  the  bootstrap  in  this  setting  of 
quantile  estimation. 

2.  NOTATION  AND  PRELIMINARIES 

Let  X?,...,X°  denote  the  true  survival  times  of  n  items  or 
1  n 

individuals  that  are  censored  on  the  right  by  a  sequence 
Uj,...,Un,  which  in  general  may  be  either  constants  or  random 
variables.  The  X?'s  are  nonnegative,  independent,  identically 
distributed  random  variables  with  common  unknown  distribution 
function  Fq  and  unknown  quantile  function  Q°(p) =F~^(p)= 


A 


=inf  { t : Fq( t ) >p} ,  0<p<l. 

The  observed  right-censored  data  are  denoted  by  the  pairs 
(X., A. ),  i=l| • • . ,n,  vhere 


Xd  =  min{X? ,  U.}, 


A. 

1 


1  if  X?  <  U. 
1-1 

0  if  X?  >  U.  . 

l  l 


Thus,  it  is  known  which  observations  are  times  of  failure  or 


death  and  which  ones  are  censored  or  loss  times.  The  nature  of 


the  censoring  depends  on  the  Ih's.  (i)  If  U^,...,U  are  fixed 
constants,  the  observations  are  time-truncated.  If  all  Ik's  are 
equal  to  the  same  constant,  then  the  case  of  Type  I  censoring 
results,  (ii)  If  all  U.  =  X?  .,  the  rth  order  statistic  of 
X°,...,X°,  then  the  situation  is  that  of  Type  II  censoring.  ( i i i ) 
If  Uj,...,U  constitute  a  random  sample  from  a  distribution  H 
(usually  unknown)  and  are  independent  of  X°,...,X°,  then  (X^,A^.), 
i=l,2,...,n,  is  called  a  randomly  right-censored  sample. 

For  the  asymptotic  results  of  Padgett  (1986),  Lio,  Padgett, 
and  Yu  (1986),  and  Lio  and  Padgett  (1985),  the  random  censorship 
model  (iii)  was  assumed.  For  this  model  the  distribution  function 


of  each  X i  is  F=1-(1-Fq)(1-H) . 

A  popular  estimator  of  the  survival  function  l-Fo(t)  from  the 

censored  sample  (X^,/Y),  i=l,...,n,  is  the  product-limit  (PL) 

estimator  of  Kaplan  and  Meier  (1958).  The  PL  estimator,  which  was 

shown  to  be  "self-consistent"  by  Efron  (1967),  is  defined  as 

follows.  Let  (Z^,A^),  i=l,...,n,  denote  the  ordered  X^'s  along 

with  their  corresponding  A^'s.  Values  of  the  censored  sample  will 

be  denoted  by  the  corresponding  lower  case  letters,  (x^,6^)  and 

(z^,X..),  for  the  unordered  and  ordered  sample,  respectively.  Then 

the  PL  estimator  of  1-F  (t)  is 

o 


The  PL  estimator  of  F  (t)  is  denoted  by  F  (t)=l-P  (t),  and  the 

size  of  the  jump  of  P  (or  F  )  at  Z.  is  denoted  by  s . .  Note  that 
J  p  n  v  n  j  J  3 


rr^  l  v  V  K\<TKm.  r.”  ^  >,•  V  v,  v  v.  vuc-jr. vt  T  r.~VL*r. 


K 


I 


I 


s.=0  if  and  only  if  Z.  is  censored  for  j<n,  i.e.  if  and  only  if 
J  i  J 


and  S  si. 
n 


X,  =0 .  Define  S.  =  Z  s.  =  F  (Z.  ., ) ,  i=l,...,n-l, 

1  1  .  .  j  nv  i+l'’  ’  ’  ’ 

J  3=1  J 

A  natural  estimator  of  Q°(p)  is  the  PL  quantile  function 
Qn(p)=inf {t:Fn(t)>p)  (see,  for  example,  Sander  (1975),  Cheng 
(1984),  and  Csorgo  (1983)  for  some  of  the  properties  of  Q^). 

Since  is  a  step  function  with  jumps  corresponding  to  the 
uncensored  observations,  it  is  desirable  to  obtain  a  smoothed 
estimator  of  Q°.  The  kernel  smoothed  Qfi,  considered  by  Padgett 
(1986),  Lio,  Padgett  and  Yu  (1986),  and  Lio  and  Padgett  (1985),  is 
such  an  estimator,  and  is  defined  as  follows:  Let  {hsh^}  be  a 
"bandwidth"  sequence  of  positive  numbers  so  that  h  -K)  as  n-*00,  and 
let  K  be  a  bounded  probability  density  function  which  is  zero 
outside  a  finite  interval  (-c,c)  and  is  symmetric  about  zero. 

(For  asymptotic  results,  other  conditions  on  h^,  K,  and  Fo  are 
needed,  but  these  are  the  only  assumptions  that  will  be  made 
here.)  Then  for  0<p<l,  the  kernel  quantile  function  estimator  is 
given  by 

Qn(p)  =  h-1jjQn(t)K((t-p)/h)dt 


h  1  I  Z. J  1  K((t-p)/h)dt, 

i=l  1  l-l 


(2.1) 


where  S^sO.  An  approximation  to  Qn(p)  was  given  by  Padgett  (1986) 


as 


-1  n 

Q  (p)=h  Z  Z  s.K((S.-p)/h), 

n  i=1  li  l 


(2.2) 


Although  neither  estimator  is  difficult  to  compute,  (2.2)  will  be 
simpler  for  some  kernel  functions. 

The  asymptotic  normality,  asymptotic  equivalence,  and  mean 
square  convergence  of  (2.1)  and  (2.2)  were  studied  by  Lio,  Padgett 
and  Yu  (1986)  and  Lio  and  Padgett  (1985).  However,  no  small 
sample  properties  have  been  derived.  In  fact,  due  to  the 
mathematical  complications  introduced  by  censoring,  an  exact 
expression  for  the  mean  squared  error  of  Qn(p)  for  small  n  is  not 
available.  Thus,  a  bandwidth  value  minimizing  the  exact  mean 
squared  error  of  Qn(p)  cannot  be  obtained.  A  practical  method  for 
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selecting  bandwidth  is  discussed  in  Section  4.  First,  in  the  next 
section,  a  large  simulation  study  is  reported  which  gives  compar¬ 
isons  of  Q  and  Q  with  Q  and  gives  an  indication  of  the  behavior 
n  n  n 

of  these  estimators  with  respect  to  the  bandwidth  values,  the 
censoring  mechanism,  the  kernel  function,  and  sample  size,  using 
the  mean  squared  error  criterion. 

3.  COMPARISON  OF  ESTIMATORS:  SIMULATION  RESULTS 

A  Monte  Carlo  simulation  was  performed  for  five  families  of 
lifetime  distributions  that  are  commonly  used  in  life  testing. 
These  distributions  are  shown  in  Table  1.  Two  censoring 
distributions  H  were  used:  exponential  with  density  h(u)  =  Xe~  , 
u>0,  X>0,  and  uniform  on  the  interval  (0,X),  X>0.  In  addition, 
three  different  kernel  functions  were  chosen  as  K1(x)=l-|xl, 

Jx | <1  (triangular),  1<2(x)=3/4(1-x  ),  | x | <1  (quadratic),  and 

K^(x)  =  l,  | x | <0 - 5  (uniform).  Also,  the  uniform  kernel  on  [-1,1] 
was  used,  producing  similar  results  as  K^. 

TABLE  1.  Lifetime  Distributions  Used  in  Simulations 


Distribution  Density 

Exponential  f (x)  =  Pexp(-px) ,x>0 


a 

'x'loc-l 

" 

fxW 

Veibull  f(x)=^ 

P 

IpJ  exp 

- 

,pJ  . 

x>0 


Gamma 


Lognormal 


Inverse 

Gaussian 


f(x)  = 


F( «) 


-xa  ^exp(-x/ P) , 


x>0 


f(x)  = 


(2rt)V 


GXp 


(log  x-a) 


2. 


!(32 


f(x)  = 


X  ) 


x>0 

% 


l2nx3J 


exp 
x>0 


X(x-p) 

2m2x 


2-, 


Notation 


E(P):p=l 


V(«,  p): 

(a,  P)  =  (0.5,1),  (2,1), 
(2,5) 


G(a,  P) : 

(a,  P)  =  (0.5,1), (2,1), 
(2,5) 


L(  a,  P)  : 

(a, p)=(0,l), (2,0.5) 


IG(p, X) : 

( J-1 ,  X)  =  (1,0.25), 
(3,1) 


The  parameter  X  of  the  censoring  distribution  was  determined 
to  give  either  30%  or  50%  censoring.  That  is,  X  was  determined 


7 


so  that  the  probability  of  a  censored  observation,  Pr(X°>U)=0.3 
or  0.5,  at  least  approximately.  This  probability  was  calculated 
by  numerical  integration  using  the  midpoint  rule  when  it  could  not 
be  obtained  exactly.  The  value  of  X  is  reported  in  the  resulting 
table  for  each  case. 

Bandwidth  values  of  h=0.01  (.02)  0.61  were  used  for  quantiles 
at  p  =  0.10,  0.25,  0.50,  0.75,  0.90,  and  0.95.  Sample  sizes  of 
n=20,  50,  100,  and  300  were  chosen,  although  only  the  results  for 
n-50  and  100  are  shown  in  the  tables  presented  here  for  brevity. 

In  each  case  simulated  (i.e.  each  distribution,  kernel, 
bandwidth,  p,  and  sample  size  combination),  1000  censored  samples 
were  generated  using  the  random  number  generators  in  the 
International  Mathematical  and  Statistical  Library  (IMSL,  1982)  on 
an  IBM  370  computer.  In  particular,  for  uniform  random  number 
generation,  the  IMSL  subroutine  GGUBS  was  used;  for  exponential 
random  numbers,  GGEXN;  for  Veibull,  GGUIB;  for  gamma,  GGAMR;  and 
for  lognormal,  GGNLG.  For  generating  random  numbers  from  the 
inverse  Gaussian  distribution,  the  method  discussed  in  Michael, 
Schucany,  and  Haas  (1976)  was  utilized.  From  the  1000  samples, 
the  estimated  mean  squared  errors  (MSE)  of  the  estimators  Qn(p), 
Qn(p)  and  Qn(p)  were  computed,  and  the  ratios  of  these  estimated 
mean  squared  errors,  A=(MSE  (WMSE  Qn)  and  B=(MSE  Q^/MSE  Qn)»  were 
calculated . 

Some  of  the  results  of  the  simulations  are  shown  in  Tables  2- 
9.  In  each  case,  for  each  p,  there  is  a  range  of  bandwidth  values 
for  which  Qn(p)  has  smaller  estimated  mean  squared  error  than  that 
of  Qn(p).  The  same  behavior  of  Qn  was  observed,  except  that  the 
best  bandwidth  values  were  generally  larger  than  those  for  Q^, 
indicating  that  more  smoothing  is  required  for  Qn- 

Parzen  (1979)  indicated  that  kernel  estimators  of  quantile 
functions  do  not  generally  give  good  estimates  for  p  near  zero  or 
one  since  quantile  functions  are  usually  nonintegrable.  This  is 
quite  noticeable  for  Q  and  Q  in  Tables  2-9  for  p  near  one, 
although  for  some  values  of  h,  is  still  better  than  the  PL 
quantile  estimator.  Also,  it  should  be  noted  that  as  h-O  for 
fixed  n,  Qn(p)->Qn(p)  and  hence  the  ratios  of  mean  squared  errors 
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TABLE  5.  RATIOS  OF  MEAN  SQUARE  ERRORS  WITH  TRIANGULAR  KERNEL 

LIFE  DISTRIBUTION:  E(l)  ,  CENSORING  DISTRIBUTION:  U(0,  3.1941) 
n  =  100  (APPROX.  301  CENSORING ) 

h 


0.01 

0.05 

0.11 

0.15 

0.21 

0.25 

0 . 31 

0.35 

0 .41 

0.45 

0 . 51 

0.55 

0.61 

10 

A 

1.02 

1.15 

1.33 

1.43 

1  .33 

1  .13 

0.77 

0 . 58 

0 .38 

0 . 29 

0 . 20 

0.16 

0.11 

B 

1.10 

1.27 

1.50 

1.65 

1.65 

1.43 

0.98 

0.73 

0.46 

0.34 

0.23 

0.18 

0.13 

25 

A 

1.02 

1.07 

1.16 

1.21 

1  .28 

1.31 

1.29 

1.23 

1.04 

0.89 

0.65 

0 .52 

0 . 36 

B 

1.04 

1.14 

1.23 

1.30 

1.40 

1.46 

1.49 

1.46 

1.29 

1.11 

0.83 

0.66 

0.46 

50 

A 

1.01 

1.05 

1.12 

1.15 

1.16 

1.15 

1 . 07 

0.98 

0.79 

0.65 

0.48 

0.43 

0 .45 

B 

0 .48 

1.09 

1.18 

1.24 

1 . 30 

1.32 

1.32 

1.28 

1.20 

1.15 

1.07 

0.86 

0 . 80 

75 

A 

1.02 

1.08 

1  .11 

1.10 

1.03 

0 . 97 

1.17 

1.52 

1.81 

1.52 

0.96 

0.70 

0.46 

B 

0.08 

1.23 

1.46 

1.59 

1.66 

1.43 

2.24 

2.20 

1 .64 

1  .  20 

0.75 

0.57 

0.40 

90 

A 

1.07 

1.35 

1.90 

1.76 

0.73 

0 . 46 

0.29 

0.23 

0.18" 

0.16 

0.13 

0.12 

0.11 

B 

0.03 

0 . 21 

0.30 

0.60 

0.47 

0.36 

0.25 

0.21 

0.17 

0.15 

0.13 

0.12 

0.11 

95 

A 

1.02 

1.02 

0 . 28 

0.17 

0.11 

0.09 

0.08 

0.07 

0.06 

0.06 

0.06 

0.05 

0.05 

B 

0.02 

0.05 

0 . 21 

0.16 

0.11 

0.09 

0.08 

0 . 07 

0.06 

0.06 

0.06 

0.05 

0.05 

A  =  (MSE  Q  )/(MSE  Q  ),  B  =  (MSE  Q  )/MSE  Q  ) 
n  n  n  n 


TABLE  6.  RATIOS  OF  MEAN  SQUARE  ERRORS  WITH  TRIANGULAR  KERNEL 

LIFE  DISTRIBUTION:  W(2,l)  ,  CENSORING  DISTRIBUTION:  E ( 0 . 4 2 5 ) 

n  =  100  (APPROX.  30%  CENSORING) 


10  a| 

h 

0 . 01 

1.02 

B 

0.82 

25  A 

1.02 

B 

0 . 47 

50  A 

1.03 

B 

0 . 11 

75  A 

1.03 

B 

0.04 

90  A 

1.04 

B 

0.02 

95  A 

1.07 

B 

0.02 

A  = 

0.05  0.11 


1.14 

1.13 

1.34 

1.23 

1.08 

1.09 

1.16 

1.17 

1.07 

1.06 

1.14 

1.13 

1.07 

1.05 

1.13 

1.15 

1 .14 
1.02 

1 . 20 
1.25 

1.23 

0.54 

0 .38 
0.45 

0.15  0.21 


1.41 

1.16 

1.39 

1.07 

1.23 

1.22 

1.34 

1.29 

1.18 

1.18 

1  .  24 
1 . 25 

1.16 

1.22 

1.17 

1.32 

1.18 

1.09 

0.33 
0 . 35 

0  17 
0.20 

0 .10 
0 .11 

0.25  0.31 


1.42 

1.08 

1.56 

1.20 

1.41 

1.32 

1.46 

1.28 

1.28 

1.30 

1.33 

1.37 

1.10 

1.36 

1.33 

1.44 

0 . 19 
0.20 

0.11 

0.11 

0.08 

0.08 

0.06 

0.06 

0.35  0.41 


1.74 

1.34 

2.13 

1.67 

1.45 

1.23 

1.42 

1.17 

1.36 

1.42 

1.39 

1.50 

1.40 

1.31 

0.77 

0.70 

0.08 

0.09 

0.06 

0.07 

0 .05 
0.05 

0.05 

0.05 

0.45  0.51 


2.48 

1.99 

3.02 

2.63 

1.42 

1.15 

1.47 

1.17 

1.39 

1.55 

1 . 36 
1.60 

0.46 

0.44 

0.25 
0 .24 

0.06 

0.06 

0.05 

0.05 

0.04 

0.05 

0.04 

0.04 

0.55  0.61 


3.27 

3.10 

3.16 

3.56 

1.54 

1.22 

1.71 

1.36 

1.45 

1.66 

1.73 

1.73 

0.18 

0.17 

0.12 

0.12 

0.04 

0.04 

0.04 

0.04 

0.04 

0.04 

0.04 

0.04 

(MSE  Q  ) / ( MS E  Q  ) ,  B 


(MSE  Q  )/(MSE  Q  ) 
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TABLE  7.  RATIOS  OF  MEAN  SQUARE  ERRORS  WITH  TRIANGULAR  KERNEL 

LIFE  DISTRIBUTION:  G(2,I)  ,  CENSORING  DISTRIBUTION:  E ( 0 . 4 1 5 ) 

n  <=  100  (APPROX.  50*  CENSORING) 


2 _ 1 

0.01 

0.05 

0.11 

0.15 

0 . 21 

0 . 25 

0 .31 

0.35 

0.41 

0 .45 

0.51 

0.55 

0.61 

0 . 10 

A 

1.04 

1 .15 

1.39 

1 . 58 

1.80 

1.96 

2.23 

2 . 37 

2.36 

2.15 

1.66 

1 . 33 

0.92 

B 

0.78 

1.23 

1  .  41 

1.46 

1.55 

1.69 

2.03 

2.31 

2.69 

2.75 

2.40 

1.98 

1.37 

0.25 

A 

1.02 

1.08 

1.17 

1.23 

1.34 

1.41 

1.54 

1.62 

1.72 

1.78 

1.81 

1.78 

1.60 

B 

0.21 

1.16 

1.25 

1.32 

1.42 

1.49 

1.57 

1.62 

1.71 

1.80 

1.95 

2.06 

2.16 

0.50 

A 

1.01 

1.06 

1.12 

1.16 

1.20 

1.22 

1.21 

1.17 

1.07 

0.97 

0.79 

0.76 

0.88 

B 

0.04 

1.11 

1  .24 

1.30 

1.39 

1  .44 

1.53 

1.58 

1.65 

1.65 

1.63 

1.56 

1.61 

0.75 

A 

1.01 

1.07 

1.14 

1.14 

1.08 

0.99 

1.14 

1.40 

1.55 

1.31 

0.87 

0.65 

0.45 

B 

0.02 

0.69 

1.23 

1.33 

1.32 

1.16 

1.75 

1.73 

1.37 

1.06 

0.70 

0.54 

0.90 

A 

1.05 

1.23 

1.47 

1.85 

1.27 

0.86 

0 . 55 

0.44 

0.34 

0 . 29 

0.25 

o'.  2  3 

B 

0.02 

0  .'31 

0.60 

1.27 

1.04 

0.77 

0.52 

0.43 

0.33 

0 . 29 

0.25 

0.23 

0 . 20 

0.95 

A 

1.06 

1  .26 

0.82 

0 . 50 

0 .32 

0 . 26 

0 . 21 

0 . 19 

0.17 

0.16 

0.15 

0.14 

B 

0.03 

0 .13 

0.82 

0 . 57 

0.36 

0 . 29 

0 .23 

0.21 

0.18 

0.17 

0.15 

0.14 

A  =  (MSE  Q  ) / ( MSE  Q  ),  B  =  (MSE  Q  )/(MSE  Q  ) 
n  n  n  n 


TABLE  8 .  RATIOS  OF  MEAN  SQUARE  ERRORS  WITH  TRIANGULAR  KERNEL 

LIFE  DISTRIBUTION:  L(0,I),  CENSORING  DISTRIBUTION:  E(0.274) 
fl  =  100  (APPROX.  30*  CENSORING) 


should  be  near  one  for  small  h.  This  is  not  the  case  for  Q^,  as 
pointed  out  by  Padgett  (1986). 

With  respect  to  the  kernel  functions,  the  results  are  quite 
similar  for  all  three.  However,  the  bandwidth  value  giving  the 
largest  ratio  of  mean  squared  errors  is  generally  slightly  larger 
for  the  uniform  kernel  than  for  the  triangular  or  quadratic 
kernels,  indicating  that  more  smoothing  is  needed. 

In  all  cases,  the  bandwidth  value  giving  the  largest  ratio  of 
estimated  mean  squared  errors  for  Qn  tends  to  increase  with  p  up 
to  about  p=0.75,  and  then  decrease  for  larger  p.  This  indicates 
that  more  smoothing  is  needed  in  the  middle  of  the  distribution 
than  in  the  tails  to  decrease  the  mean  squared  error  of  the 
estimator. 

Increasing  the  amount  of  censoring  from  30%  to  50%  seems  to 
have  little  effect  on  the  estimated  ratios. 

4.  BANDWIDTH  SELECTION  USING  THE  BOOTSTRAP 

The  simulation  results  of  Section  3  indicate  reasonable 
ranges  of  the  bandwidth  to  use  in  practice,  if  the  forms  of  the 
lifetime  distribution  and  censoring  mechanism  are  known.  However, 
in  general,  the  forms  of  the  distributions  are  not  known;  this  is 
the  reason  for  using  a  nonparametric  estimator.  Since  the 
proposed  estimator  Qn(p)  is  nonparametric,  it  is  desired  to  use  a 
bandwidth  selection  method  which  does  not  require  the  parametric 
assumptions  of  the  results  of  Section  3.  That  is,  given  the 
right-censored  sample  of  size  n,  what  is  an  "optimal"  bandwidth 
value  to  use  in  calculating  Qn(p)?  The  bootstrap  for  censored 
data  provides  a  solution  to  this  problem  for  a  minimum  mean 
squared  error  optimality  criterion. 

It  is  proposed  to  estimate  the  mean  squared  error  of  Qn(p), 
MSE  (Qn(p)),  as  a  function  of  H  (for  fixed  p)  by  the  bootstrap 
method  and  to  choose  the  value  of  h  which  minimizes  the  estimated 
MSE  (Q  (p)).  Let  (x.,&.),  i=l,...,n,  denote  the  observed 
censored  sample.  A  bootstrap  replicate  (x^.,6^),  i=l,...,n,  is 
obtained  by  randomly  drawing  with  replacement  from  the  set  of  n 
bivariate  observations  (x^.,6^).  Note  that  this  simple  resampling 
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scheme  makes  no  use  of  the  estimated  survival  distribution  or 
censoring  distribution,  but  has  been  shown  to  give  the  same 
results  as  the  bootstrap  based  on  a  resampling  scheme  which 
reflects  the  censoring  structure  of  the  data  (see  Efron,  1981). 

Denote  the  product-limit  estimate  of  Q°(p)  based  on  a 
bootstrap  sample  by  Qn(p)  and  let  Qn(p)  denote  the  kernel  estimate 

from  bootstrap  data;  that  is, 

Q*(P)  =  h_1jJ  Q*(t)  K((t-p)/h)dt. 

Based  on  a  large  number,  B,  of  bootstrap  replicates,  an  estimator 
of  the  variance  of  Qn(p)  is 

B  B 

V(h)  =  (B-l)_1{  Z  [Q*i<P) l2  -  [  E  Q*.(P)]2/B),  (4.1) 

i=l  i=l 

and  the  bootstrap  estimate  of  Bias[Qn(p)]  is 

B  (h)  =  B_1  E  Q  ,(p)  -  Q  (p) . 

i=l  ni  n 


Denote  the  bootstrap  estimate  of  the  mean  squared  error  of  Qn(p), 

MSEn  ,  .,  by  MSE*  ,  .. 

Qn(p)  Qn(p) 

In  many  of  the  simulations  described  in  Section  3,  as  with 
most  kernel-type  function  estimators,  for  fixed  p,  as  the  band¬ 
width  h  increased,  the  square  of  the  bias  of  Qn(p)  tended  to 


Thus,  MSEq  (p)(h) 


increase  while  the  variance  tended  to  decrease. 

should  be  a  decreasing  and  then  increasing  function  of  h,  and  the 
bootstrap  estimate,  MSEq  ^(h),  should  yield  a  value  of  h  giving 

an  approximate  minimum  value  of  MSEq  ^^(h).  However,  in  many 

situations  encountered  in  this  study,  MSEq  .  .  (h)  was  strictly 

n'p' 

decreasing  in  h.  This  was  due  to  both  Q  (p)  and  the  estimate  from 

*  n 

a  bootstrap  sample,  Qn(p),  being  oversmoothed,  and  hence  quite 

close  together.  This  resulted  in  a  poor  estimate  of  the  bias  in  a 

2 

"variance-plus-bias  "  estimation  of  the  mean  squared  error  of 
Qn(p).  Therefore,  the  bootstrap  estimate  of  bias  was  modified  by 
using  the  PL  quantile  function  from  (x^,6^),  which  does  not  depend 
on  h,  instead  of  Qn(p).  Hence,  the  bias  estimate  was  modified  to 


L*  >  V*  '  •  ' 


Bo (h)  .  B  Z  Q  ,(p)  -  Q  (p). 

/  i=l  ni  n 

As  h  gets  large  (i.e.  the  kernel  estimate  is  oversmoothed), 

2 

[B^Ch) )  tends  to  increase.  A  bootstrap  estimate  of  MSEq  (p)(h) 
then  obtained  by  n 

MSE*  (p)(h)  =  V(h)  +  B^(h).  (4.2) 

The  value  of  h  is  chosen  to  minimize  (4.2),  yielding  an  estimated 
bandwidth  h(p). 

To  illustrate  the  procedure  and  to  give  an  indication  of  how 
well  it  performs,  a  random  sample  of  size  n=100  was  generated  from 
the  exponential  life  distribution  E(l)  with  E(3/7)  censoring 
distribution  as  in  Table  2.  The  functions  MSE*  .  .(h)  from 

o„<pr 

B=300  bootstrap  samples  at  each  h  and  p  are  shown  in  Figure  1. 

The  triangular  kernel  function  was  used  in  these  calculations. 

The  estimated  ratios  of  mean  squared  errors  from  Table  2  are  shown 
as  functions  of  h  in  Figure  2.  Table  10  shows  the  "best" 
bandwidths  from  Table  2,  hR(p),  and  h(p)  from  Figure  1.  Note  the 
very  close  agreement  of  these  bandwidth  values.  Figure  3  shows 
the  true  quantile  function  Q°  and  the  kernel  estimate  Qn(p)  using 
the  estimates  of  h  as  0.17  (0<p<.2),  0.23  (.2<p<.5),  0  25 
( , 5<p<. 7) ,  0.47  ( . 7 <p<.8) ,  0.11  (.8<p<.95),  and  0.07  (.95<p<1.0). 
In  general,  a  value  of  h(p)  can  be  estimated  for  each  value  of  p 
for  which  Qn(p)  is  to  be  plotted. 

TABLE  10.  Comparison  of  Best  Bandwidths  from 
Figures  1  and  2 


5.  INTERVAL  ESTIMATION 


The  bootstrap  can  also  be  used  to  obtain  interval  estimates 
for  Q°(p).  Assume  that  h  has  been  determined  and  p  is  fixed. 

The  bootstrap  estimate  of  the  standard  error  of  Qn(p)  given 
by  equation  (4.1)  can  be  used  to  define  an  approximate  (l-a)100% 
confidence  interval  for  Q°(p), 

(on(p)  -  zl  a/2  Jv(h),  Qn(p)  +  zi_aJ2  )  ’  <5.!) 

where  2^_a/ 2  t^ie  (l-0^)  percentile  point  of  the  standard 

normal  distribution.  This  interval  requires  no  additional 

bootstrap  calculations  than  those  involved  in  selecting  h.  Efron 

(1986)  has  shown  that  the  symmetric  interval  of  the  form  (0±oz)  is 

"correct"  if  the  statistic  0  has  a  normal  distribution. 

Asymptotic  normality  of  Qn(p)  had  been  established  (Lio,  Padgett, 

and  Yu,  1986),  but  for  small  to  moderately  large  samples  and  p 

near  0  or  1  there  is  some  skewness  in  the  distribution  of  Q  (p). 

n 

Furthermore,  Efron's  results  on  the  validity  of  the  standard 
bootstrap  interval  (0±ca)  refer  to  the  parametric  bootstrap  in 
which  resampling  is  from  the  parametric  MLE  of  the  distribution 
function.  Although  easily  computable,  the  interval  given  in  (5.1) 
may  be  inaccurate,  since  small  sample  skewness  of  the  distribution 
of  Qn(p)  will  not  be  reflected.  Since  the  nonparametric  bootstrap 
is  used  here,  an  interval  which  requires  no  normality  (or 
symmetry)  assumptions  may  be  more  appropriate  in  this  setting  of 
quantile  estimation. 

The  above  discussion  suggests  a  second  approach  based  on  the 

bootstrap  percentile  interval  method.  The  idea  is  to  use  the 
★ 

bootstrap  values  Qn^(p),  to  estimate  the  actual 

distribution,  G,  of  Qn(p).  Given  n  and  p,  the  (Monte  Carlo 
estimate  of  the)  bootstrap  distribution  of  Qn(p)  is  defined  as 

#{Q*i(p)<k) 

G*(k)  =  P*(Qn(p)<k)  =  - g -  .  (5.2) 

While  B  =  300  is  sufficient  to  estimate  standard  errors,  a  larger 
number  of  bootstrap  values  Qn^(P)  is  needed  to  obtain  adequate 
estimates  of  G.  When  B  is  too  small,  the  bootstrap  may  yield  poor 
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estimates  of  the  tail  behavior  of  G.  Generally,  B=1000  is 
considered  large  enough. 

The  percentile  interval  uses  quantiles  of  G  to  estimate  the 
quantiles  of  the  true  distribution  G.  An  approximate  100(l-oc)% 
confidence  interval  for  Q°(p)  is  defined  by 

[G*_1(a),  G*-1(l-a)  ] .  (5.3) 

Note  that  this  interval  requires  additional  computations  to  those 
involved  with  bandwidth  selection,  but  these  calculations  are 
minimal  since  the  value  of  h  has  been  determined. 

As  an  illustration  of  the  methods  presented  in  Section  4  and 
equations  (5.1)  and  (5.3),  consider  the  censored  data  set  of  n=100 
observations  given  in  Table  11.  This  data  set  was  generated  from 
the  exponential  life  and  censoring  distributions  used  in  Table  2 
(that  is,  30%  censoring).  The  values  of  h(p)  were  determined  from 

TABLE  11.  Simulated  Censored  Sample 
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B=300  bootstrap  samples  as  described  in  Section  4  for  p=.10,  .50, 
.75.  For  the  chosen  value  h(p),  the  estimate  Qn(p)  was  calculated 
and  the  bias,  the  standard  error,  and  the  two  approximate  95% 
confidence  intervals  described  above  were  obtained  from  B=1000 
bootstrap  samples.  The  results  are  given  in  Table  12.  Note  that 
the  intervals  from  (5.3)  are  shifted  slightly  from  those  given  by 
(5.1),  indicating  the  skewness  of  the  distribution  of  Qn(p). 

TABLE  12.  Computation  Results  for  Simulated  Data 


p 

Q°(p) 

h(p) 

Qn(P) 

Interval  (5.1) 

Interval  (5.3) 

0.10 

0.1054 

0.17 

0.1462 

(0.0849,0.2076) 

(0.0953,0.2194) 

0.50 

0.6931 

0.23 

0.7440 

(0.5781,0.9099) 

(0.5903,0.9118) 

0.75 

1.3863 

0.49 

1.2605 

(0.9466,1.5744) 

(0.9735,1.5862) 

The  performance  of  the  bootstrap  percentile  interval  (5.3) 

depends  on  how  well  the  bootstrap  distribution  of  Q*(p) 

approximates  the  distribution  of  Q  (p).  Asymptotic  validity  of 

★  ^ 

the  bootstrapped  estimator  Qn(p)  can  be  established. 

First,  note  that 

|Q*(p)  -  Q°(p)  I  <  |Q*(p)  -  Qn(p)|  +  |Qn(p)  -  Q°(P)|. 

For  the  first  term  on  the  right-hand-side,  write 

Q*(p)  -  Qn(p)  =  /J[Q*(t)-Qn(t)]h"1(K((t-p)/h)dt 

=  n~%;Q[q*(t)-q*(p)]h_1K((t-p)/h)dt 

+  n"¥2jJ[qn(p)-qn(  t)  ]h_1K((  t-p)/h)dt 

+  [Q*(p)-Qn(p)];Jh_1K((t-p)/h)dt 

=  n_l/2I1  +  n"1/!I2  +  I3, 

where  q*(t)  =  n%[ Q*( t )-Q°( t ) ]  and  qn(t)  =  n%[ Qn( t )-Q°( t ) ]  denote 

the  bootstrapped  PL  quantile  process  and  the  PL  quantile  process, 

respectively.  Now,  by  Lemma  2  of  Padgett,  Lio,  and  Yu  (1986), 

under  the  conditions  on  h  and  K  stated  in  Section  2  and  if  F  is 

o  '  ° 

continuous  with  density  fQ,  f q ( Q  (p))>0,  if  f  is  continuous,  and 

H(Tp  )<1,  where  Tp  =  sup { t : F  ( t )<1)  ,  |l2  |  -»  0  in  probability  as 
o  o 


! 
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n-*35.  Also,  by  a  proof  similar  to  that  of  Lemma  2  of  Padgett,  Lio, 
and  Yu  (1986)  using  the  results  of  Horvath  and  Yandell  (1986),  it 
can  be  shown  that  |l^  |-0  in  probability  as  n-**>. 

Nov,  under  the  conditions  of  Corollary  2.2  of  Horvath  and 
Yandell  (1986), 

Fn(Q°(p))-p 

|Q*(P)-Qn(p)|  <  - - -  -  [Q°(P)-Q*(P)1 

f0(Q°(p)) 

Fn(Q°(p))-p 

+  - - - [Q°( 

f0(Q°(p)> 

Fn(Q°(p))-F*(Q°(p)) 
f0(Q°(p)) 

n/  i  \%v  n/  -3/4,,  .5/4, 

=  0(n  (log  n)  )  +  0(n  (log  n)  )a.s. 

for  each  p  such  that  F(Q°(p))<l.  Thus,  for  such  p,  |Q*(p)-Q(p)  |->0 

in  probability. 

Finally,  since  |Qn(p)-Q°(p)  |-0  in  probability  (see  Padgett, 
1986),  for  p  so  that  F(Q°(p))<l,  jQ*(p)-Q°(p) |-*)  in  probability. 

Thus,  the  bootstrapped  percentiles  converge  to  the  value 
Q°(p) ,  providing  large  sample  justification  of  the  percentile 
interval.  It  should  be  noted  that  the  bootstrap  convergence 
results  presented  here  refer  to  the  theoretical  bootstrap 
distribution  of  Qn(p)  (when  B=«),  which  in  practice  is  estimated 
by  Monte  Carlo  methods  (with  B=1000)  as  described  earlier. 
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